European Soccer players attributes comparison, classified based on FIFA ratings

Author

Sujan Bhattarai

Published

March 1, 2024

Restating Question

My question has changed from what I proposed earlier because I could not find the number of goals scored by players in the dataset. Instead, I will compare the attributes of players based on their FIFA overall ratings. The dataset provides an overall rating for every player, which I use to categorize players as Advanced, Intermediate, or Novice. Players rated 85 or higher are defined as Advanced, players rated from 70 up to (but not including) 85 are Intermediate, and players rated below 70 are Novice.
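A minimal sketch of the classification rule, in base R so that it runs without the database. The ratings are made-up values placed on and around the two cut-offs; the analysis code later in this document treats a rating of exactly 85 as Advanced and exactly 70 as Intermediate, and `cut()` with `right = FALSE` reproduces that behaviour.

```r
# Hypothetical ratings chosen to sit on and around the two cut-offs.
ratings <- c(62, 69, 70, 84, 85, 93, NA)

# right = FALSE closes each interval on the left: [0, 70), [70, 85), [85, Inf).
player_class <- cut(
  ratings,
  breaks = c(0, 70, 85, Inf),
  labels = c("Novice", "Intermediate", "Advanced"),
  right = FALSE
)

# Show each rating next to its class; a missing rating stays missing.
data.frame(ratings, player_class)
table(player_class, useNA = "ifany")
```

Checking the boundary values this way makes it easy to confirm that the wording of the categories and the code agree.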

The overarching question is: how do player attributes differ among players in the three categories?

Subquestions are:

  1. Is there a difference in median age between advanced, intermediate, and novice players?
  2. How do the means of common attributes such as passing, volleys, and speed differ among advanced, intermediate, and novice players?
  3. Where do these groups of players stand on the combination of dribbling, finishing, and strength, which are key attributes for scoring goals in soccer?

Which variables to use in the analysis?

  • I will use the overall rating column to classify players as Advanced, Intermediate, and Novice. This allows a comparison of the player attributes crossing, finishing, dribbling, agility, aggression, balance, and strength. There is no gold standard that says players rated 85 or above are advanced; I set the categories for the sake of comparison and out of personal preference. Categorized this way, there are more than 4000 players in the Intermediate category and more than 6000 in the Novice category, but only around 350 in the Advanced category. I therefore randomly sampled 1000 players from each of the Intermediate and Novice categories and included every player from the Advanced category.
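The sampling idea can be sketched in base R as follows. The data frame, its sizes, and the cap of 100 are all hypothetical stand-ins (the analysis itself uses 1000 and the real player table); the point is only to show a capped draw per class with a fixed seed.

```r
set.seed(123)  # fixed seed so the same rows are drawn on every run

# Hypothetical imbalanced data; the class sizes are invented for illustration.
players <- data.frame(
  id = 1:1150,
  player_class = rep(c("Advanced", "Intermediate", "Novice"),
                     times = c(50, 500, 600))
)

cap <- 100  # assumed cap for the toy data; the document uses 1000

# Per class: keep every row if the class is small, otherwise draw `cap` rows.
keep <- unlist(lapply(split(players$id, players$player_class), function(ids) {
  if (length(ids) <= cap) ids else sample(ids, cap)
}))

balanced <- players[players$id %in% keep, ]
table(balanced$player_class)
```

Keeping the small class whole while capping the large ones stops the two big groups from visually drowning out the advanced players in the plots.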

I selected the birth date column (birthday) from the dataset and, because the soccer seasons in the data run until 2016, I used 2016 as the base year to calculate the age of the players. So, when I say median age of players, it is the median age of players in 2016. This information will be included in the infographic to avoid misinformation.
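A small sketch of the age calculation, assuming birthdays are stored as text beginning with `YYYY-MM-DD` (the three dates and the 1 July reference date are made up). Subtracting birth years gives the age a player reaches during 2016, which can be one year higher than the age on a particular day in 2016.

```r
# Hypothetical birthdays; the text format is an assumption about the data.
birthday <- c("1987-06-24 00:00:00", "1991-07-08 00:00:00", "1998-12-20 00:00:00")
birth_date <- as.Date(birthday)

# Year-based age, as used in this analysis: base year minus birth year.
birth_year <- as.numeric(format(birth_date, "%Y"))
age_2016 <- 2016 - birth_year

# For comparison: age on an assumed reference date within 2016.
ref <- as.Date("2016-07-01")
had_birthday <- format(ref, "%m%d") >= format(birth_date, "%m%d")
age_on_ref <- age_2016 - ifelse(had_birthday, 0, 1)

data.frame(birthday, age_2016, age_on_ref)
```

For medians over thousands of players the two definitions differ by at most one year, which is why the year-based shortcut is reasonable as long as the infographic states the base year.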

Handwritten infographic template

knitr::include_graphics("data/template.jpg")

Resources I have used

#insert link here - radar chart - ternary_plots

# load the packages
library(dplyr)
library(RSQLite)
library(ggplot2)
library(ggradar)
library(grid)
library(magick)
library(ggtern)
library(tidyverse)
library(showtext)
library(patchwork)
library(janitor)
library(glue)
library(ggtext)
library(geofacet)
library(cropcircles)
library(ggpath)
library(readr)
library(scales)
library(jpeg)

# create a connection to the database
soccer_dataset <- dbConnect(RSQLite::SQLite(), "data/soccer_data/database.sqlite")

# get the player attributes data for all european players
attr <- dbGetQuery(soccer_dataset, "SELECT * FROM Player_Attributes")

# get the player data for all european players
player <- dbGetQuery(soccer_dataset, "SELECT * FROM Player")


# combine player and player_attributes dataframes based on primary key player_api_id
player_attr <- inner_join(player, attr, 
                          by = c("player_api_id" = "player_api_id"), 
                          suffix = c("_player", "_player_attributes"))


# keep one row per player; distinct() keeps the first attribute record it finds for each player
player_attr <- player_attr %>% distinct(player_api_id, .keep_all = TRUE)
#---the cut-offs below are a personal preference, not taken from a source
#---classify the players based on their overall rating
player_attr <- player_attr %>% 
  mutate(player_class = ifelse(overall_rating >= 85, "Advanced", 
                                 ifelse(overall_rating >= 70 & overall_rating < 85, "Intermediate", "Novice")))

My required variables of interests are:

# select the required columns from the dataset
player_data <- player_attr %>% 
        select(player_name, player_class, crossing, finishing, 
               dribbling, agility, aggression, balance, strength, stamina)
#---filter only good (Advanced) players from the player_data dataset
good_players <- player_data %>% filter(player_class == "Advanced")

#---filter average and bad players from the player_data dataset
set.seed(123)
average_bad_players <- player_data %>% filter(player_class != "Advanced") %>% 
  group_by(player_class) %>% 
  #randomly select 1000 players from each class
  sample_n(1000) %>% 
  ungroup()

#good defender van dijk, selected explicitly so that he is always included
van_dijk <- player_data %>% filter(player_name == "Virgil van Dijk")

#---combine the datasets; distinct() drops the duplicate row if van Dijk was also sampled
clean_player_data <- bind_rows(good_players, average_bad_players, van_dijk) %>% 
  distinct()

Comparison of common attributes among player categories

#summarize the data; the summary is plotted as a radar chart below
radar_data <- clean_player_data %>% 
  group_by(player_class) %>% 
  #drop the name column; the grouping column player_class is kept automatically
  select(-player_name) %>%
  summarise_all(mean, na.rm = TRUE) %>% 
  #order the rows as Advanced, Intermediate, Novice
  arrange(match(player_class, c("Advanced", "Intermediate", "Novice")))

#create custom color palette
colors <- c("#7f58AF", "#64C5EB", "#E84D8A", "#FEB326", "lightblue")

radar_plot <- ggradar(radar_data,
        grid.min = 0,
        grid.max =  100,
        grid.mid = 50,
        axis.label.size = 14,
        label.centre.y = FALSE,
        group.line.width = 0.8,
        gridline.mid.colour = "grey",
        group.point.size  = 2,
        grid.label.size = 14,
        group.colours = colors,
        background.circle.colour = "white",
        legend.title  = "Players proficiency",
        legend.text.size = 24) +
  
  #make the background theme white; a complete theme must come first,
  #because it replaces any theme() settings added before it
  theme_minimal() +
  theme(
    #legend and caption text sizes
    legend.title = element_text(size = 18),
    legend.text = element_text(size = 18),
    plot.caption = element_text(size = 18),
    #remove x and y axis labels
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    #remove all grid lines
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    #remove legend
    legend.position = "none")

#save this as png
ggsave(plot = radar_plot, filename = "radar_plot.png", height = 5, width = 5)

#read the image and include it in the rmd
knitr::include_graphics("static_radar.png")
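The radar chart is drawn from a summary table with one row per player class, the class label in the first column, and one column of means per attribute. A base R sketch of building a table of that shape, using a made-up data frame with only two attributes:

```r
# Hypothetical player rows; values are invented for illustration.
toy <- data.frame(
  player_class = c("Advanced", "Advanced", "Novice", "Novice", "Intermediate"),
  crossing     = c(80, 90, 40, NA, 60),
  finishing    = c(85, 75, 30, 50, 65)
)

# One row per class, class label first, then the per-class mean of each attribute.
# na.rm = TRUE is passed on to mean() so a missing value does not blank a class.
radar_shape <- aggregate(
  toy[c("crossing", "finishing")],
  by = list(player_class = toy$player_class),
  FUN = mean, na.rm = TRUE
)

radar_shape
```

Because the attribute ratings run from 0 to 100, the means fit directly inside the 0 to 100 grid used for the chart without further rescaling.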

Comparison of three major attributes of soccer players

#standardize all numeric columns of clean_player_data to the 0-1 range based on each column's min and max value
player_standarized <- clean_player_data %>% 
  mutate(across(where(is.numeric), ~scales::rescale(.x, to = c(0, 1))))
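# A minimal base R sketch of the min-max rescaling applied above. The helper
# name `min_max_sketch` and the strength values are hypothetical.

```r
# Map the smallest value to 0 and the largest to 1, ignoring missing values.
min_max_sketch <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

strength <- c(40, 55, 70, 100, NA)  # made-up ratings
scaled <- min_max_sketch(strength)
scaled
```

# Each column is rescaled on its own, so a 0 means "lowest in this sample for
# that attribute", not "no ability at all".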

# Ternary plot for three attributes: finishing, dribbling, and strength
tern_plot <- 
  ggtern(player_standarized, aes(x = finishing, y = dribbling, z = strength, color = player_class)) +
  geom_point(alpha = 0.3) +
  theme_rgbw() +
  scale_color_manual(values = colors) +
  # Highlight all points for advanced players (same three axes as the main layer)
  geom_point(data = player_standarized %>% filter(player_class == "Advanced"), 
              aes(x = finishing, y = dribbling, z = strength)) +
  # Change legend title
  labs(color = "Players proficiency") +
  # Add labels for popular players in the present data
  geom_text(data = player_standarized %>% 
            filter(player_name %in% c("Lionel Messi", "Cristiano Ronaldo", "Virgil van Dijk")), 
              aes(label = player_name),
              alpha = 1,
              hjust = -0.1, 
              vjust = 0.5, 
              size = 8, color = "black") +
  # Outline the points for Messi, Ronaldo, and van Dijk
  geom_point(data = player_standarized %>% 
            filter(player_name %in% c("Lionel Messi", "Cristiano Ronaldo", "Virgil van Dijk")), 
              aes(x = finishing, y = dribbling, z = strength, fill = player_class), 
              color = 'black', 
              shape = 21, 
              size = 3) +
  #manual fill color for the highlighted players
  scale_fill_manual(values = c("Advanced" = "#7f58AF", "Intermediate" = "#64C5EB", "Novice" = "#E84D8A"))+
  #remove the legend for fill
  guides(fill = "none") +
  #increase legend size
  theme(legend.text = element_text(size = 32))+
  #increase legend title
  theme(legend.title = element_text(size = 36))+
  #increase axis text size
  theme(axis.text = element_text(size = 32))+
  #remove the legend
  theme(legend.position = "none")

#save this as png using ggsave with name tern_plot
ggsave(plot = tern_plot, filename = "tern_plot.png", height = 5, width = 5)

#Include the ternary plot in the rmd
knitr::include_graphics("static_tern.png")
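One point worth keeping in mind when reading the ternary plot: as I understand ternary diagrams, each point is placed by the share that each of the three values contributes to their total, not by the values themselves. A base R sketch with three made-up players shows the consequence.

```r
# Hypothetical rescaled attribute values (0 to 1) for three invented players.
attrs <- data.frame(
  finishing = c(0.9, 0.3, 0.5),
  dribbling = c(0.9, 0.3, 0.2),
  strength  = c(0.6, 0.2, 0.8),
  row.names = c("player_a", "player_b", "player_c")
)

# Convert each row to shares that sum to 1.
shares <- attrs / rowSums(attrs)
round(shares, 3)
```

Here player_a is three times as strong as player_b on every attribute, yet both have identical shares and would land on the same spot. The plot therefore shows the balance between finishing, dribbling, and strength, while the overall level is carried by the colour of the player class.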

Distribution of median age among the European players

#derive age from the birthday column of player_attr, then plot jittered points with boxplots
age_plot <- player_attr %>% 
  #convert the date to year
  mutate(birthday = as.Date(birthday)) %>% 
  mutate(birthday = as.numeric(format(birthday, "%Y"))) %>% 
  #calculate the age
  mutate(age = 2016 - birthday) %>% 
  ggplot(aes(x = player_class, y = age, color = player_class)) +
  geom_jitter(alpha = ifelse(player_attr$player_class == "Advanced", 1, 0.2), width = 0.3) +
  theme_minimal() +
  ylab("Age of the players in 2016") +
  scale_size_continuous(range = c(1, 10)) +
  scale_color_manual(values = colors)+
  #add geom boxplot over it to show the distribution
  geom_boxplot(aes(y = age, color = player_class), alpha = 0.4, width = 0.6, outlier.shape = 20)+
  # set the axis text size to 28
  theme(axis.text = element_text(size = 28)) +
  # set the axis title size to 48
  theme(axis.title = element_text(size = 48)) +
  #remove x axis title
  xlab("")+
  #increase the legend text size to 32
  theme(legend.text = element_text(size = 32))+
  #set title of legend to "Players proficiency"
  labs(color = "Players proficiency") +
  #increase legend title
  theme(legend.title = element_text(size = 36))+
  #increase legend symbol size
  theme(legend.key.size = unit(1.2, "cm"))+
  #make the text black
  theme(text = element_text(color = "black"))

#save as png
ggsave(plot = age_plot, filename = "age_plot.png", height = 5, width = 5)

#include the age plot in rmd
knitr::include_graphics("static_age.png")
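To put a number on subquestion 1 alongside the boxplots, the median age per class can be computed directly. A base R sketch with invented ages (not results from the dataset):

```r
# Hypothetical ages; these are not values from the soccer database.
ages <- data.frame(
  player_class = c(rep("Advanced", 3), rep("Intermediate", 4), rep("Novice", 3)),
  age          = c(27, 29, 31, 24, 28, 30, 33, 19, 22, 26)
)

# Median age within each class.
median_age <- tapply(ages$age, ages$player_class, median)
median_age
```

Running the same `tapply()` call on the real age column would give the exact medians behind the middle line of each box.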

Answer the following questions:

What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R? - I tried to inset the ternary plot into the main template, but it keeps reporting that there is no third coordinate. I tried several approaches, but none worked. It seems that the ternary plot does not fit into another base plot, because its mapping requires three coordinates. Also, for both the radar and ternary plots, text annotation does not seem to work: every annotation overlays the main graph, no matter what position I specify. I tried moving the annotations into free space, but that feature does not seem to work here. Another challenge is understanding other people's infographic templates. I looked at the UFO template and reran all of its code. When I tried copying the same style, some functions, such as min_max, did not work. Also, the output of each step looks very large in the plot window, yet everything looks perfect in their final plot. In the beginning, it is extremely challenging to understand how the output is being resized.

What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear? - I mostly need feedback on how simple my graph is. When I start working on a project, I tend to keep adding every detail and piece of information that comes to mind, and only later realize that the result is cluttered. Last time I also made the mistake of not selecting the correct visualization types for my research question, so I also need feedback on whether the types of plot I am presenting are good ways to present the data.

What extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations? - I need to use ggradar, magick, ggtern, glue, ggplot2, etc. in my visualizations. I think we have covered all of them; at least we have discussed their use cases, even though we have not used every package directly. My learning goal here is to build an infographic template, more than to prepare different plots.

Infographics from previous plots

Continuation of HW4 and a part of HW3

#specify colors and fonts
alien <- c('#47fcea', '#28ee85', '#17bd52', '#679d76', '#3e6f50', '#27593d')
txt <- alien[2]
bg <- 'black' # '#010101'
accent <- txt

font_add("fa-brands", regular = "ufo/fa-brands-400.ttf")
font_add("fa-solid",  regular = "ufo/fa-solid-900.ttf")
font_add_google("Orbitron", "orb")
font_add_google("Barlow", "bar")
showtext_auto()
ft <- "orb"
ft1 <- "bar"

# 🔡 text --------------------------------------------------------------------

mastodon <- glue("<span style='font-family:fa-brands; color:{accent}'>&#xf4f6;</span>")
twitter <- glue("<span style='font-family:fa-brands; color:{accent}'>&#xf099;</span>")
github <- glue("<span style='font-family:fa-brands; color:{accent}'>&#xf09b;</span>")
floppy <- glue("<span style='font-family:fa-solid; color:{accent}'>&#xf0c7;</span>")
space <- glue("<span style='color:{bg};font-size:1px'>'</span>")
space2 <- glue("<span style='color:{bg}'>--</span>")
caption <- glue("{mastodon}{space2}@sujan@{space}sujandon.org{space2}{twitter}{space2}@sujan{space2}{github}{space2}sbgithubhm/tidytues{space2}{floppy}{space2}European players attributes comparison")
# ## ---------------copy of UFO plot
g_base <- ggplot() +
  labs(
    title = "European Soccer Players \n attributes comparison based \n on Overall ratings 2016",
    subtitle = "Advanced players (85+ ratings), Intermediate players (70-84 ratings), Novice players (<70 ratings)",
    caption = caption
    ) +
  theme_void() +
  theme(
    text = element_text(family = ft, size = 36, lineheight = 0.3, colour = txt),
    plot.background = element_rect(fill = "white", colour = bg),
    plot.title = element_text(size = 128, face = "bold", hjust = 0.5, margin = margin(b = 10)),
    plot.subtitle = element_text(family = ft1, hjust = 0.5, margin = margin(b = 20), color = "#27593d"),
    plot.caption = element_markdown(family = ft1, colour = colorspace::darken(txt, 0.5), hjust = 0.5,
                                    margin = margin(t = 20)),
    plot.margin = margin(b = 20, t = 50, r = 50, l = 50),
    axis.text.x = element_text())

# quote 1 for the age distribution

quote1 <- ggplot() +
  annotate("text", x = 0, y = 1, label ="Median age of players is almost the same \n across all proficiency categories",
           family = ft1, colour = txt, size = 16, hjust = 0, fontface = "italic", lineheight = 0.4) +
  xlim(0, 1) +
  ylim(0, 1) +
  theme_void() +
  coord_cartesian(clip = "off")


quote2 <- ggplot() +
  annotate("text", x = 0, y = 1, label ="Highly rated players have greater agility,\n finishing, crossing and dribbling",
           family = ft1, colour = txt, size = 16, hjust = 0, fontface = "italic", lineheight = 0.4) +
  xlim(0, 1) +
  ylim(0, 1) +
  theme_void() +
  coord_cartesian(clip = "off")

#quote 3
quote3 <- ggplot() +
  annotate("text", x = 0, y = 1, label = str_wrap("Ternary plot is not rendering because it says it requires 3 axes. Why would it require 3 axes if I want to bring it in the plan"),
           family = ft1, colour = txt, size = 16, hjust = 0, fontface = "italic", lineheight = 0.4) +
  xlim(0, 1) +
  ylim(0, 1) +
  theme_void() +
  coord_cartesian(clip = "off")

Convert the tern plot to raster object to load into the base plot.
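What the conversion does can be sketched without any image file. The array below is a hypothetical stand-in for a decoded PNG: height by width by colour channel, with values between 0 and 1, which is the layout `png::readPNG()` returns by default. `as.raster()` comes from grDevices, which R loads automatically.

```r
# Hypothetical 2 x 3 pixel RGB image: height x width x channel, values in [0, 1].
img <- array(0, dim = c(2, 3, 3))
img[, , 1] <- 1  # red channel at full intensity, green and blue at zero

# Convert the numeric array into a matrix of colour codes.
r <- as.raster(img)

class(r)
dim(r)
```

A raster is just a grid of colours, so it has no ternary coordinate system attached, which is why the saved image can be inset into the base plot when the ternary plot object itself cannot.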

# load the image and rasterize it
library(png)
image <- readPNG("tern_plot.png")
#convert this to a raster object
image <- as.raster(image)
# Combine the plots into a single infographic
g_final <- g_base +
  #insert age plot
  inset_element(age_plot, left = 0, right = 0.6, top = 1, bottom = 0.66) +
  #insert ternary plot (the raster image loaded above)
  inset_element(image, left = -0.01, right = 0.8, top = 0.6, bottom = 0) +
  #insert radar plot
  inset_element(radar_plot, left = 0.5, right = 1.10, top = 0.8, bottom = 0.4) +
  #insert quote 1
  inset_element(quote1, left = 0.7, right = 1, top = 0.8, bottom = 0.72) +
  #insert quote 2
  inset_element(quote2, left = 0, right = 0.5, top = 0.6, bottom = 0.5) +
  #insert quote 3
  inset_element(quote3, left = 0.7, right = 1, top = 0.2, bottom = 0) +
  plot_annotation(
    theme = theme(
      plot.background = element_rect(fill = "white", colour = 'white')))

ggsave(plot = g_final, filename = "infographics_draft.png", height = 16, width = 10)
#include the saved infographic image in the rendered document
knitr::include_graphics("static_infographics.png")